ITRI-97-12 “I don’t believe in word senses”
Abstract
Word sense disambiguation assumes word senses. Within the lexicography and linguistics literature, they are known to be very slippery entities. The paper looks at problems with existing accounts of ‘word sense’ and describes the various ways in which a word’s meaning can deviate from its core meaning. An analysis is presented in which word senses are abstractions from clusters of corpus citations, in accordance with current lexicographic practice. The corpus citations, not the word senses, are the basic objects in the ontology. The corpus citations will be clustered into senses according to the purposes of whoever or whatever does the clustering. In the absence of such purposes, word senses do not exist. Word sense disambiguation also needs a set of word senses to disambiguate between. In most recent work, the set has been taken from a general-purpose lexical resource, on the assumption that the resource describes the word senses of English, French, and so on, between which NLP applications will need to disambiguate. The implication of the paper is, by contrast, that word senses exist only relative to a task.
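As a rough illustration of the claim that citations are clustered into senses according to the purposes of whoever does the clustering, the following minimal sketch groups the same corpus citations under two different tasks. Everything in it is an assumption made for illustration and not taken from the paper: the example citations for "bank", the two task-specific key functions, and the keyword tests they use.

```python
from collections import defaultdict

# Hypothetical corpus citations for "bank"; invented for illustration only.
CITATIONS = [
    "she deposited the cheque at the bank",
    "the bank raised its interest rates",
    "they walked along the bank of the river",
    "the bank building was demolished last year",
]


def cluster(citations, sense_key):
    """Group citations by whatever distinction the task's sense_key draws."""
    clusters = defaultdict(list)
    for citation in citations:
        clusters[sense_key(citation)].append(citation)
    return dict(clusters)


def translation_key(citation):
    # Hypothetical task 1: only the river/non-river distinction matters.
    return "riverside" if "river" in citation else "financial"


def extraction_key(citation):
    # Hypothetical task 2: the institution and its premises must be kept apart.
    if "river" in citation:
        return "riverside"
    if "building" in citation:
        return "premises"
    return "institution"


# The same citations yield a different sense inventory for each task.
translation_senses = cluster(CITATIONS, translation_key)
extraction_senses = cluster(CITATIONS, extraction_key)
```

Under these assumptions the first task produces two clusters and the second produces three, although the underlying citations are identical. The keyword tests are a stand-in for whatever criteria a real task would supply.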
Similar resources
ITRI-97-04 Foreground and Background Lexicons and Word Sense Disambiguation for Information Extraction
ITRI-00-28 What’s in a thesaurus
We first describe four varieties of thesaurus: (1) Roget-style, produced to help people find synonyms when they are writing; (2) WordNet and EuroWordNet; (3) thesauruses produced (manually) to support information retrieval systems; and (4) thesauruses produced automatically from corpora. We then contrast thesauruses and dictionaries, and present a small experiment in which we look at polysemy i...
"I Don't Believe in Word Senses"
Word sense disambiguation assumes word senses. Within the lexicography and linguistics literature, they are known to be very slippery entities. The paper looks at problems with existing accounts of ‘word sense’ and describes the various kinds of ways in which a word’s meaning can deviate from its core meaning. An analysis is presented in which word senses are abstractions from clusters of corpu...
Using a Semantic Concordance for Sense Identification
This paper proposes benchmarks for systems of automatic sense identification. A textual corpus in which open-class words had been tagged both syntactically and semantically was used to explore three statistical strategies for sense identification: a guessing heuristic, a most-frequent heuristic, and a co-occurrence heuristic. When no information about sense-frequencies was available, the guessi...